[57] ABSTRACT
A computer token identification system uses a plurality 
of unique tokens to represent a plurality of items. The 
token architecture consists of a delimiter field, a version 
field, and a variable field. The delimiter field contains at 
least one token recognition character. The version field 
immediately follows the delimiter field and contains the
version string of at least one character identifying a 
unique token version. The variable field immediately 
follows the version field and contains a variable string 
of at least one character conforming to a format specifi- 
cation for the token version. Each variable string is 
unique for a token version. The version string and vari- 
able string can be of varying lengths, and the characters 
of the version string and variable string that are adja- 
cent are from different character set types. 
COMPUTERIZED SYSTEM FOR REPRESENTING
DATA ITEMS USING TOKEN IDENTIFIERS

FIELD OF THE INVENTION

This invention relates to computerized systems for 
organizing and representing data items, and more par- 
ticularly to token identifier systems used to identify data 
instances. 

BACKGROUND OF THE INVENTION 

Numerous token or tag identification systems or 
schemes are widely used to represent different types of 
objects, events and locations. Examples of identification 
schemes commonly used today are license plate identi- 
fiers for cars, social security numbers for people, and 
Library of Congress identifiers for books. Computer 
systems use tokens or tags to identify everything from 
system messages to system user input. Computer
networking systems use tokens to identify system users and
error messages on a network. Computerized database 
systems use tokens for identifying and retrieving data 
items stored on computer storage areas. 

Token identifiers are particularly useful for
classifying objects stored in computerized database systems.
Unique token identifiers are used as candidate keys for 
retrieving from a computerized database, information 
relating to items or instances identified by the tokens. 

There is a particular need for a token identification
scheme that can be used with computer databases that
are interconnected worldwide. Such databases require
unique identifiers across a large name space, with parallel
independent assignment that does not produce conflicts
requiring a conflict resolution scheme.

Computer network systems have grown to be larger
and more interconnected. When network interconnections
were limited, objects in the networks, such as resources,
applications, or devices, were named according to
different proprietary schemes. Now, the existing
naming spaces, which are limited to specific sizes and
scopes, are unable to cope with growth and increasing
user needs. Furthermore, cooperative efforts rather 
than proprietary efforts are needed in order to provide 
compatibility. 

When the network systems use the same format for
token identifiers, different systems can generate the
same tokens, which eliminates token uniqueness. When
the systems use different formats for the token
identifiers, one system may not be able to identify or
make use of another system's tokens.

An example of a proposed identification scheme is
described in Zatti, S., Ashfield, J., Baker, J., and Miller,
E., Naming and Registration for IBM Distributed Systems,
IBM Systems Journal, vol. 31, no. 2, 1992, pp.
353-380. The proposed scheme conforms to standards
criteria. The International Organization for Standardization
(ISO) reference model introduces the concept of Application
Entity Title (AET). The AET is a high-level
identifier that allows applications and users to denote a
component of an application that performs communication
functions. An AET is mapped by an application
layer directory to low level addressing information used
by the Association Control Service Element to establish
associations. The International Organization for
Standardization-Consultative Committee on International
Telegraph and Telephone defined "Distinguished
Names" which provide a syntax for the identifier. The
Zatti scheme provides an extensible, global naming
scheme. However, the approach is complicated and
allows for conflicts which must be resolved using a
conflict resolution scheme. Also, the Zatti approach
does not provide an algorithm for the automated generation
of unique tokens.

In general, the token identification schemes currently
used are inadequate due to limitations on the use of the 
tokens and other problems inherent within the schemes. 

Predominantly, the token identification schemes currently
being used are directed towards providing tokens
for a particular group of objects for a single purpose. The
telephone numbering scheme is only used for identify- 
ing telephones and the Library of Congress numbering 
scheme is only used for identifying documents. These 
schemes are not designed to be used for more than one 
type of object. The Universal Product Codes (UPC)
provide an attempt at universality. However, the codes
are limited to use on products and are not
extensible.

A number of identification schemes currently being 
used rely on time or location to provide a mechanism 
for uniquely representing different items. However,
these schemes have consistency problems that hinder 
their use. Schemes that use tokens based on time have 
problems with consistency due to clocks not always 
being synchronized, granularity, and time zone differ- 
ences. Schemes that use nodes at a location for uniquely 
identifying objects have consistency problems since the 
nodes can be moved and multiple servers can exist at 
one location. 

Current schemes also have problems with extensibil- 
ity when the token has a fixed length. A token identifi- 
cation scheme can become obsolete when a maximum 
token size is required since the pool of items being rep- 
resented by the tokens can grow beyond expectations. 
For example, in a network system where network users 
are identified by fixed length tokens, if more users join 
the network system than originally anticipated, there 
can be a problem with providing enough unique tokens 
to distinguish and identify each user. 

There are problems associated with identification 
schemes that are tied to a specific code set. Code sets
vary widely, which prevents any one code set from being
compatible with all systems. For example, byte lengths
can differ, and code sets differ from one country to the
next. Some applications also require identifiers
consisting of only printable characters that can be used
by people, while in other applications it is more
efficient to use non-printable characters.

Current identification schemes also have migration 
problems. The token architecture may have to be ex- 
panded or altered due to unforeseen requirements. 
There is a need to allow for multiple versions of token 
identifiers that are orthogonal and can coexist well. 
Otherwise there is the problem of having to perform 
conversions or migrations and fallbacks, which can be 
difficult. 

There is a need for unique standardized identifiers to 
tie computer database references to real world objects 
and events using a simple symbolic representation al- 
lowing easy identification management and association. 
In particular, there is a need for a token identification 
system to supply universally unique identifiers that are 
extensible, object independent and support parallel as- 
signment from an arbitrarily large number of servers 
with uniqueness maintained. 
SUMMARY OF THE INVENTION

An object of the invention is to provide a computerized
token identification system for uniquely representing
a plurality of items. The system comprises a plurality
of unique tokens, computer generation means for
generating the unique tokens, computer assignment
means for associating a token with an item, computer
recognition means for recognizing a token, and computer
identification means for identifying the item represented
by the unique token. While the system provides
association means and identification means, there
is no requirement that the tokens generated by the system
are ever associated with an item or instance. The
requester of the token has full control over token usage.
An advantage of the present invention is that the component
parts of names are generated in a simple, sequential
series (although any non-repeating series could be
used). This allows objects to be named without human
intervention. Notions such as location, content, content
type and time can be tracked separately and associated
with a token as needed.

The tokens comprise a delimiter field containing a
token recognition character, a version field following
the delimiter field containing a version string of at least
one character which is used to identify a unique token
version, and a variable field adjacent the version field
containing a variable string of at least one character
conforming to a format specification for the unique
token version, where each variable string is unique for a
unique token version.

In one form of the invention, the version field is adjacent
the delimiter field, the version string and variable
string are of varying lengths and the characters of the
version string and variable string that are adjacent are
from different character set types.

In the preferred embodiment, the token identifies a
version number, the size of which is virtually unlimited.
The use of a version number allows the support of multiple,
simultaneous formats within the body of the section
of the token in the variable field. The version number
precisely determines the mapping to be used for the
variable field. Each version is orthogonal and can coexist
well, eliminating the need to do later conversions or
migrations and fallbacks, which can be difficult. The
support of multiple versions provides flexibility and
longevity to the token architecture and the system for
computerized token identification. The token architecture
can thus be expanded and tailored for future
unforeseen needs.

The token architecture is a simple character string
structure. The tokens can consist of all printable characters
making it suitable for imbedding in interactive software
applications. The architected tokens can support
or use any binary character sequence. The token architecture
can be applied to any coded character set using
an arbitrary number of bits to represent each character.
The representation bit values of the characters used can
optionally be chosen to be printable codepoints. There
can be some versions that are designed to be printable
while other versions can coexist which are completely
binary codings used to maximize the density of the
name space and minimize tag length.

The token identification system can be used in both
new and existing software applications. The token architecture
can be used to generate identifiers used in
current systems providing a fast and simple method for
generating tokens.

The tokens can support or use any binary character
set. The architecture can be applied to any coded character
set using an arbitrary number of bits to represent
each character. The representation (bit values) of the
characters can optionally be chosen to be printable
codepoints.

A verification procedure is incorporated in the architecture
by using an uncommon recognition character
followed by a version string of characters from a stipulated
character set type. All tokens for all versions can
be verified as tokens as long as they have the specified
combination for the recognition character and version
identifier.

Each version is orthogonal and can coexist well,
eliminating the need of later having to perform conversions
or migrations and fallbacks, which can be difficult.
The support of multiple versions provides flexibility
and longevity to the token architecture and the system
for computerized token identification. The token architecture
can thus be expanded and tailored for future
unforeseen needs.

In one version of the unique tokens, the variable field
comprises a generator identifier field, a verification
field and an identification field.

The generator identifier field is adjacent to the verification
field and contains a generator identifier string of
at least one character. The generator identifier string
identifies the generator of the unique token. The verification
field contains a verification character used to
verify the correctness of the unique token. The use of a
verification character allows the generated tokens to
employ a self-checking feature to enhance integrity and
increase reliability. The identification field is adjacent to
the verification field and contains an identification
string of at least one character. The identification string
is unique for a unique token generator.

In the preferred embodiment, the generator identifier
string and the identification string are of varying lengths.
The verification field differs from the generator identifier
field and is of fixed length. The token is delimited by
a character from a different character set type from the
identification string character type.

It is an object of the invention to provide a general
[...] generating unique tokens (also referred to as [...])
that are extensible and support parallel assignment from
an arbitrarily large number of servers with uniqueness
maintained.

It is a further object of the invention to provide a
token architecture where there are varying length fields
which provide a [...] boundless naming domain.

It is a further object of the invention to provide a
token identification system using tokens that are independent
of and can be applied to any object, event or
location, such as persons, books, videos, license plates,
telephone numbers, addresses, the borrowing of a book,
a phone call, altitude, a date or time. The tokens can
also be used to identify instances of objects.

It is a further object of the invention to provide a
token identification system using a token architecture
that has no internal hierarchy, complex order, location,
or time basis.

A further object of the invention is to provide a token
identification system where the generation of the token
uses rules that are simple, encapsulating the minimum
information which is not related to object or events.
BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a schematic diagram of a computer system
capable of generating and processing unique token
identifiers;

FIG. 2 is a diagram of a token architecture;

FIG. 3 is a diagram of a preferred embodiment of the
token architecture of FIG. 2;

FIG. 4 is an embodiment of a version of the token
architectures of FIG. 3;

FIG. 5 is an example of a token using the version
architecture of FIG. 4; and

FIG. 6 is a diagram showing a system using the token
identification system.

DESCRIPTION OF PREFERRED EMBODIMENT

FIG. 1 shows a computer system 10 providing a
token identification system. The computer system 10
includes a processor (CPU) 12, memory 14 and a terminal
16 by which a computer user can interact with the
system. The computer system 10 consists of a plurality
of data processors which are all in communication. The
computers can each be running a database management
system (DBMS) which organizes data stored on a database.
The data is stored on an external storage device 18
such as a direct access storage device (DASD).

There are a number of processes which the CPU
tracks and controls including displaying system messages,
identifying system users and organizing and retrieving
data in the database. Tokens are used as labels
identifying each process for storage organizational
purposes.

The CPU 12 of the system 10 runs a series of commands
to generate unique token identifiers which can be
assigned to a variety of items including the tracking of
internal system processes. The token identification system
can also be used by a system user at a terminal 16
for assigning token identifiers to a variety of objects or
events for storage in a database.

The system used for generating the unique tokens
does not have to be in a network system or be running
a database program. Such a system is shown as an example
of a type of system that could use the token identification
system. The computer systems for generating
tokens, associating tokens with instances, recognizing
tokens, and identifying the instance identified by a token
may each be separate computer systems, where
such systems each have a CPU and memory or other
storage device.

The computerized token identification system uses
unique tokens to represent a plurality of items. The
computer system 10 runs a program consisting of a
number of commands implemented by the CPU to generate
the unique tokens. The computer system also has
an assignment capability in which each token is associated
with an item that it represents. The token and
description of the represented item are stored in the
computer memory or on the DASD. In the preferred
embodiment, the association of the token with the item
can be implemented by the CPU using a table in which
the token is a candidate key for a table entry providing
a description of the stored item.

The computer system also contains recognition
means for recognizing a token and identification means
for identifying the item represented by the unique token.
A parsing procedure can be used to recognize the
token architecture when parsing text strings or program
strings to identify tokens that are contained within the
text. [...] where a table is used
to associate a token with a description of the represented
item, the recognition means consists of using the
tokens as candidate keys to locate the table entry containing
a description of the item represented by that
token. This would be performed by the CPU implementing
a set of instructions to search a table stored in
computer memory or external storage.

Methods of implementing the generation of tokens
(according to the architecture to be described below),
the assignment of tokens using tables, the recognition of
a token and the identification of the item represented by
the token using tables are well known to those skilled in
the art.

Referring to FIG. 2, the unique token architecture 30
is a simple character string or byte array structure comprising
a sequence of concatenated fields 31. The architecture
comprises a delimiter field 32, a version field 34,
and a variable field 36 which is the body of the token.
The tokens 30 begin with the token delimiter field 32
which provides a means by which a recognition program
can recognize the start of a token. The delimiter
field 32 comprises a token recognition character 33.

The character used as the delimiter could be any
arbitrary character but it needs to be unique enough so
as to be able to signal the start of a token. The recognition
delimiter character should have the characteristic
of being a character symbol that is relatively uncommon
in text so that it can be used when parsing text
strings to identify tokens that are contained within the
text or program to represent another item. The recognition
delimiter character should be a character symbol
available in most computer coded character sets, such
as ASCII or EBCDIC. The recognition delimiter character
should be operating system neutral, avoiding sequences
such as blanks that may be automatically discarded
in some computer systems. It is advantageous
for the recognition character to be visible to the human
eye in printable and common coded character sets. The
recognition character delimiter choice should also be
common across the universe of connected systems. In
that way, a tag generated by any system can be
correctly interpreted by any other system as being a
generated unique token for the token identification
system.

There is no character which meets all of these requirements
on all systems and in all character sets. In
the preferred embodiment (see FIG. 3), the character
symbol "<" is used as the token recognition character
to delimit the start of a token. The "<" symbol is relatively
uncommon in text and programming languages.
The "<" symbol is also visible to the eye and is available
in most printable and common coded character
sets. Therefore, the "<" symbol is an acceptable token
recognition character.

The recognition character serves as an "eye-catcher";
that is, a human (and machine) readable initial
delimiter for tokens. Of course, the recognition character
has been, and will continue to be, used in contexts
that have nothing to do with tokens, but when people or
programs scan text for tokens, candidate tokens are
easily identified. In control structures, databases and
communications protocols, fields containing tokens will
typically be known to contain tokens. In such cases, the
recognition character can still be useful for debugging.

The token version field 34 immediately follows the
delimiter field 32. The version field 34 contains a version
string 35 of at least one character. The version
string 35 identifies a unique token version. The token
version identifies a predefined format for parsing and
mapping the remaining sections of the token referred to
as the body or the variable field 36.

The variable field 36 immediately follows the version
field and contains a variable string 37 of at least one
character. The variable string 37 conforms to a format
specification for the unique token version specified by
the version string 35. The variable string is unique for a
token version and thus provides the unique identification
for each token. The version number and the variable
field can be used as a candidate key to entries in a
table.

The use of a unique token version allows the support
of multiple, simultaneous formats within the body of the
section of the tag in the variable field 36. The token
version precisely determines the mapping to be used for
the variable field 36. During the token identification
process, the version is identified and then a table or
other device is used to provide the format for interpreting
the remaining fields in order to use the information
contained in those fields.

Each version is orthogonal and can coexist well,
eliminating the need of later having to perform conversions
or migrations and fallbacks, which can be difficult.
The support of multiple versions provides flexibility
and longevity to the token architecture and the system
for computerized token identification. The token architecture
can thus be expanded and tailored for future
unforeseen needs.

The version string 35 can be virtually unlimited in
size. This allows for an indeterminate number of unique
token versions that can be implemented. The unique
token version is identified by parsing the token from the
point of the delimiter until a character from a different
character set from that allowed for the version string is
reached. For example, if the token version is a sequence
of characters in the set of numeric characters (0 ... 9),
the token version might end when the next character is
in the set of alphabetic characters (A ... Z). The token
version (and any other fields) could also be delimited by
a set of characters with cardinality 1; that is, an explicit
delimiter character such as a comma. The character from
the different character set type starts the variable string 37.
The variable string is also of varying length. The end
of the variable field (which is the end of the token) is
identified by reaching a character 38 not in the character
set defined for the token variable field.

The version identifier directly following the token
recognition character provides a verification system for
the tokens. A string can be identified and verified as
being a token by determining that the string begins with
the uncommon character used for the token recognition
character followed by a character from the specified
character set type for the version identifier.

Referring to FIG. 3, a token architecture 40 representing
a preferred embodiment of the token architecture
30 of FIG. 2 is shown. In the delimiter field 32, the
character "<" 42 provides the delimiter for the token.
Adjacent the delimiter field 32 is the version field 34.
The version field 34 contains a version string 44 of
numeric character values, where each character in the
version string is in the set of numerals 0 through 9. The
version string 44 is of varying length. The characters of
the version string 44 and variable string 37 that are
adjacent are from different character set types 47. The
termination of the version string 44 is identified by the
occurrence of a non-numeric character.

The use of numeric characters for the version identifier
has the advantage of being recognized world wide.
The internal representation for a numeric value in the
code sets of different languages is the same on most
systems.

Referring to FIG. 4, a token architecture 49 of a
particular version of the token architecture 40 of FIG.
3 is shown. The value for the version string 44 is a "1"
(item 50) indicating the token architecture is the first
version for the token architecture of FIGS. 2 and 3.

In the preferred embodiment of version "1" of the
token architecture, the variable field 36 consists of a
generator identifier field 52, a verification field 54 and
an identification field 56. The version dependent body
sections format is dependent on the rules defined for the
version. The body sections for version 1 provide implementation
specifications and an algorithm for generating
tokens in the version 1 format.

The generator identifier field 52 is adjacent the version
field 34. The generator identifier field 52 contains a
generator identifier string 53 of at least one character.
The generator identifier string 53 specifies the generator
of the token. That is, the generator field 52 identifies
which computer system, in a multiple computer system,
generated the unique token. The format of the token
generator identification string is a variable length,
uppercase alphabetic field using letters A-Z (item
[...]) and is terminated by the occurrence of a
non-uppercase alphabetic character 62.

[...] computer system instance capable of creating
tokens [...] a specified token version and has an assigned
[...] unique token generator identifier. The
generator identifier can be assigned by a central administrator
for the system and stored in a central registry
(such as a generator identity table) stored in a database
accessible by the computers in the computer system.
The specified body format for the version is also registered
in the central registry facility. A query facility is
used to access the information on the format and version
specific information registered in the facility.

Adjacent the generator identifier field is the verification
field 54. The verification field contains a one character
verification character 55 used as a check digit.
The check digit is provided to insure that the token has
not been corrupted during generation or subsequent
handling. The verification system is an internal consistency
check which is inexpensive in terms of CPU time
and memory requirements; yet it increases the integrity
of the data. At any time, the validity of a token may be
verified by recomputing and validating the check digit.
There are many algorithms known to those skilled in
this field that can be used to provide a verification
system.

In the preferred embodiment, the algorithm is as
follows. Each character from all sections after the
initial delimiter is mapped to its corresponding
value. Characters within the 0 through 9
range are mapped correspondingly to the numeric values
0 through 9. The letters A through Z are mapped to
corresponding values in the range of 10 through 35.
These values are totaled. For the purpose of computation,
the value of the check digit position is taken to be
zero. The number of total characters in the key after the
initial delimiter, including the check digit length, is
added to the sum of the values representing each character.
The sum is divided by ten and if the remainder is
zero, then the check digit is zero. If the remainder is not
equal to zero, then the check digit is the difference of
ten and the remainder. Other algorithms, which are well
known to those skilled in the art, can be easily
substituted to produce a check digit or other verification
character. The verification character is placed in the
verification field following the generator identifier string.
The placement of the verification character in the token
can also be varied.

The algorithm selected favors very inexpensive
check digit validation over generation cost. In this
embodiment, the check digit can be verified by simply
adding up the character values (using the mapping
described above) of all of the characters in the token
after the recognition character (including the check
digit) and the number of characters in the token
(excluding the recognition character). The remainder
of this sum divided by 10 will always be 0 for
valid tokens. Using this scheme, it is not necessary to
parse the token and explicitly find the check digit.
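
The generation and validation steps described above can be sketched in a few lines of Python. This is a minimal illustration and not the implementation of the preferred embodiment: the function names and the sample field values (version "1", generator "AB", identification "C7") are hypothetical, and the sketch assumes the version 1 layout of delimiter, version, generator, check digit and identification string.

```python
def char_value(ch: str) -> int:
    """Map '0'-'9' to 0-9 and 'A'-'Z' to 10-35 (the version 1 mapping)."""
    if "0" <= ch <= "9":
        return ord(ch) - ord("0")
    if "A" <= ch <= "Z":
        return ord(ch) - ord("A") + 10
    raise ValueError(f"character {ch!r} is outside the version 1 ranges")


def compute_check_digit(version: str, generator: str, identification: str) -> str:
    """Compute the check digit, treating its own position as zero."""
    body = version + generator + identification
    total = sum(char_value(ch) for ch in body)
    # Add the character count after the delimiter, check digit included.
    total += len(body) + 1
    remainder = total % 10
    return "0" if remainder == 0 else str(10 - remainder)


def build_token(version: str, generator: str, identification: str) -> str:
    """Assemble a token; the field order is the assumed version 1 layout."""
    check = compute_check_digit(version, generator, identification)
    return "<" + version + generator + check + identification


def is_valid(token: str) -> bool:
    """Validate by summing every character value plus the character count.

    The check digit is summed like any other character, so its position
    does not need to be located. This checks only the sum, not the layout.
    """
    if len(token) < 2 or not token.startswith("<"):
        return False
    body = token[1:]
    try:
        total = sum(char_value(ch) for ch in body) + len(body)
    except ValueError:
        return False
    return total % 10 == 0
```

For the hypothetical fields above, the character values total 41 and the character count is 6, giving 47. The remainder is 7, so the check digit is 3 and the token is <1AB3C7. Validation sums 44 plus 6, which is 50, leaving a remainder of 0.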

The self-checking feature enhances integrity and
increases reliability. The checking feature consists of all
printable characters, thus making it suitable for imbedding
in interactive software applications.

Adjacent the verification field 54 is the identification
field 56. The identification field 56 contains an
identification string 57 of at least one character. The
identification string 57 provides a unique value for a version and
generator. The combination of the version string 50,
generator identifier string 53, and identification string
57 is universally unique within the computer systems
using the token identification system.

The format of the identification string 57 is a varying
length string of uppercase English alphabetic
letters A through Z and numeric digits 0 through 9.
The field (and token) is terminated by the occurrence of
a non-uppercase alphabetic, non-numeric character. An
example of a termination character for the token would
be a blank space.
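
As an illustration of how a program might scan text for candidate tokens in this format, the following Python sketch uses a regular expression. The pattern and the function name are assumptions made for this example. The pattern encodes the version 1 layout as described: the "<" recognition character, a numeric version string, an uppercase generator identifier string, one check digit, and an identification string that ends at the first character outside A through Z and 0 through 9.

```python
import re

# Assumed version 1 layout: "<", digits (version), uppercase letters
# (generator), one check digit, then uppercase letters and digits.
TOKEN_PATTERN = re.compile(r"<[0-9]+[A-Z]+[0-9][A-Z0-9]+")


def find_candidate_tokens(text: str) -> list[str]:
    """Return every substring of text that has the version 1 token shape."""
    return TOKEN_PATTERN.findall(text)
```

A "<" that is followed by a blank, as in the expression "a < b", is not reported, because the recognition character must be followed by a character from the numeric set. The matches are candidates only; a check digit validation would still be needed to confirm them.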

In a preferred embodiment, the identification string is
generated so as to increase one at a time (sometimes
known in the industry as monotonically increasing).
The token termination is delimited by the first occurrence
of a character outside the domain of the characters
allowed in the preceding token field. In the version
1 token architecture, the last token field is the identification
field 56 containing characters in the character set
type range of capital letters A through Z and numeric
characters 0 through 9. Therefore, when parsing text
strings to identify tokens, the presence of a character
outside the range specified for the identification field
indicates that the token is terminated.
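
One way to produce identification strings that increase one at a time is to count in base 36 over the allowed characters. The Python sketch below is a hypothetical illustration. The ordering of the alphabet (digits before letters, matching the value mapping used for the check digit) and the function name are assumptions, and any other non-repeating series could be substituted.

```python
import string

# Assumed ordering: "0".."9" then "A".."Z", 36 symbols in total.
ALPHABET = string.digits + string.ascii_uppercase


def next_identification(current: str) -> str:
    """Return the successor of current, counting in base 36."""
    chars = list(current)
    i = len(chars) - 1
    while i >= 0:
        pos = ALPHABET.index(chars[i])
        if pos < len(ALPHABET) - 1:
            chars[i] = ALPHABET[pos + 1]
            return "".join(chars)
        chars[i] = ALPHABET[0]  # this position wraps; carry to the left
        i -= 1
    # Every position wrapped, so the string grows by one character.
    return ALPHABET[1] + "".join(chars)
```

Because the field has varying length, the counter never runs out: when every position holds "Z", the string simply grows by one character.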

The generator identifier string and the identification 
string are of varying lengths. The fields of the token do 
not require delimiters due to the use of mutually 
exclusive character ranges. In version 1, the ranges used 
are numeric values 0-9 and uppercase alphabetic characters 
A-Z. Alternatively, exclusive delimiters of different 
lengths could be implemented. However, the preferred 
embodiment provides a shorter, more compact token. While 
specific character ranges and symbols have been specified, 
alternative ranges such as lowercase alphabetics or 
different initial and final delimiting characters can also 
be implemented. 
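
The way a change of character range can stand in for a field 
delimiter may be sketched as follows. This is only an illustration 
under stated assumptions, not the implementation of the preferred 
embodiment: the "<" delimiter, the numeric version string, and the 
alphabetic generator identifier string are inferred from the example 
token of FIG. 5, and the names V1_PATTERN and parse_v1 are 
hypothetical. 

```python
import re

# Assumed version 1 layout, inferred from the FIG. 5 example "<1AA40":
#   "<"        token recognition character (delimiter field)
#   [0-9]+     version string (numeric range)
#   [A-Z]+     generator identifier string (alphabetic range)
#   [0-9]      verification character (a single check digit)
#   [A-Z0-9]+  identification string
V1_PATTERN = re.compile(r"<([0-9]+)([A-Z]+)([0-9])([A-Z0-9]+)")


def parse_v1(text):
    """Split a version 1 token at the start of `text` into its fields.

    Field boundaries are found from the change of character range
    alone; the token ends at the first character outside A-Z and 0-9.
    Returns None when `text` does not start with a matching token.
    """
    match = V1_PATTERN.match(text)
    if match is None:
        return None
    version, generator, check, ident = match.groups()
    return {
        "version": version,
        "generator": generator,
        "check": check,
        "ident": ident,
    }


if __name__ == "__main__":
    # The trailing blank terminates the token.
    print(parse_v1("<1AA40 is mentioned here"))
```

Because the version string is numeric and the generator identifier 
string is alphabetic, neither field needs a fixed length or a 
separator character in this sketch. 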

The tokens can support or use any binary character 
set. The architecture can be applied to any coded 
character set using an arbitrary number of bits to represent 
each character. The representation (bit values) of the 
characters can optionally be chosen to be printable 
codepoints. In the version 1 architecture, the tokens are 
specifically designed to be printable in ASCII or 
EBCDIC. The printability of the tokens is a convenience 
implemented in version 1 to facilitate direct human 
interactions and manipulations such as imbedding the 
tags in text as references, debugging, etc. In other 
versions where human ease of use considerations are not 
applicable, the full range of binary coding may be used 
to maximize the density of the space and minimize the 
length of the token. 

The set of homogeneous identifiers can be used to 
persistently and uniquely identify virtually anything. 
The token architecture provides a name space large enough to 
potentially replace existing naming schemes 
such as phone numbers, Library of Congress numbers, 
geographic coordinates, UPC codes, and date and time 
stamps. The tokens are independent of and can be ap- 
plied to any object (person, book, video, license plate, 
telephone number or address), event (past, present, 
future, borrowing of a book, a phone call), or location 
(longitude, latitude, altitude, date, time). They can also 
be used to identify instances of objects. The token archi- 
tecture has no internal hierarchy, complex order, loca- 
tion, or time basis. The generation rules are simple and 
encapsulate the minimum information needed to identify 
items using the tokens. The tokens are not tied to any 
particular object or even to the data they identify. 

Referring to FIG. 5, a token 70 conforming to the 
specifications of the version 1 token architecture 49 of 
FIG. 4 is shown. The token has the value <1AA40. 
The "<" 72 is the delimiter character. The "1" 74 
indicates that the token 70 is a version 1 token. The "AA" 
76 indicates the token 70 was generated by a system 
generator designated as "AA". The identification string 
57 has a value of "0" 78. The verification character 
(check digit) 55 has a value of "4" 80, which is derived 
from the algorithm explained in detail above 
(10 - ((1 + 10 ("A") + 10 ("A") + 0 + 5 (the number of 
digits)) mod 10)). 
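
The arithmetic of this example can be reproduced with a short 
sketch. It rests on assumptions that go beyond the example itself: 
only the value 10 for "A" is given above, so the values 11 through 35 
for B through Z are assumed; the count of 5 is taken to be the number 
of characters following the delimiter, including the check digit; and 
a result of 10 is assumed to wrap to 0 so that the check remains a 
single digit. The function names are hypothetical. 

```python
def char_value(ch):
    """Numeric value of one token character.

    Digits count as themselves and "A" counts as 10, as in the FIG. 5
    example. Valuing B..Z as 11..35 is an assumption.
    """
    if "0" <= ch <= "9":
        return int(ch)
    if "A" <= ch <= "Z":
        return ord(ch) - ord("A") + 10
    raise ValueError(f"character {ch!r} is outside the version 1 ranges")


def check_digit(version, generator, ident):
    """Derive the verification character for the given field strings."""
    body = version + generator + ident
    # Assumed: the length term counts every character after the
    # delimiter, including the check digit itself (5 for "<1AA40").
    length = len(body) + 1
    total = sum(char_value(c) for c in body) + length
    # Assumed: a result of 10 wraps to 0 to stay a single digit.
    return (10 - total % 10) % 10


if __name__ == "__main__":
    # 1 + 10 + 10 + 0 + 5 = 26; 26 mod 10 = 6; 10 - 6 = 4
    print(check_digit("1", "AA", "0"))
```

With the check digit included, the sum for the example becomes 
1 + 10 + 10 + 4 + 0 + 5 = 30, whose remainder on division by 10 is 0. 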

The tokens are generated by concatenating into a 
string format, the token recognition character, the ver- 
sion string identifying the token version and the vari- 
able string that is unique within a version and conforms 
to a format specified for that version. In the preferred 
embodiment of the version 1 tokens, the concatenated 
fields of the variable field are the generator identifier 
string identifying the token generator, the verification 
character generated by the algorithm described above, 
and an identification string that is unique for a token 
generator. 

The tokens are associated with represented items 
using a table having the tokens as candidate key to 
entries providing a description of the represented item. 
The tokens can be assigned automatically by the com- 
puter system or manually input by a system user to 
identify virtually any item or occurrence. 
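
A minimal sketch of such a table is given below, assuming an 
in-memory mapping in place of whatever storage an actual system would 
use. The class name TokenTable, its method names, and the sample 
description are hypothetical. 

```python
class TokenTable:
    """Associates tokens with descriptions of the represented items.

    The token acts as a candidate key: each token identifies at most
    one entry, so assigning an already used token is rejected.
    """

    def __init__(self):
        self._entries = {}

    def assign(self, token, description):
        """Associate `token` with the description of one item."""
        if token in self._entries:
            raise KeyError(f"token {token!r} is already assigned")
        self._entries[token] = description

    def describe(self, token):
        """Return the description for `token`, or None if unknown."""
        return self._entries.get(token)


if __name__ == "__main__":
    table = TokenTable()
    table.assign("<1AA40", "an illustrative item description")
    print(table.describe("<1AA40"))
```

The same assign call serves both cases mentioned above: the token 
passed in may come from an automatic generator or from manual input. 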

The tokens are recognized when parsing a text area 
using the token recognition character. The token ver- 
sion is then identified and a table provides the specifica- 
tion for the variable field. The end of the token is deter- 
mined during the parsing procedure by the presence of 
a character from a character set type that is different 
from the character set type called for by the version 
format for the last field of the token. 

The items represented by the tokens are identified 
using the tokens as candidate keys for tables containing 
descriptions and related information about the 
represented items. 

The concatenation and details of implementing the 
generation of the tokens, the implementation of the 
association of tokens with items, the implementation of 
the token recognition process, and the implementation 
of the identification process are well known to those 
skilled in this field. 

Referring to FIG. 6, use of the token identification 
system is shown in the context of a computerized text 
processor 90. Tokens 92 are used to represent further 
descriptions to be imbedded in the text 94. 

A table 96 is used to store the tokens in conjunction 
with a description of the item represented 98. The token 
identification table 96 has as one column 100 the token 
values 92 and as another column 102 the descriptions of 
the items represented by the tokens 98. 

Tokens are used throughout the text area 94. When 
the text 94 is parsed, the candidate tokens 92 are 
recognized based on the token recognition character 104. 
Once the token is recognized as a token, the token itself 
is parsed to identify the version and the remaining 
information unique to that version such as the termination of 
the token. The version 1 tokens can be verified using the 
check digit. The tokens are then used as a key to find an 
entry in the token identification table where a 
description of the item represented by the token is found. In a 
preferred embodiment, the generator identifier string 
identifies the generator which generated the token and 
a token identification table for one or more generators is 
used to identify the related information. The token itself 
can also be used to directly represent an object. 
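
The sequence just described (recognize, verify, look up) can be 
sketched as follows. This is an illustration under assumptions rather 
than the disclosed implementation: the "<" recognition character and 
the field ranges are taken from the FIG. 5 example, the values of the 
letters B through Z are assumed to continue from "A" = 10, the 
verification is assumed to be the rule that the character values plus 
the character count leave a remainder of 0 on division by 10, and the 
table is a plain mapping. All names are hypothetical. 

```python
import re

# Assumed version 1 shape: "<", numeric version, alphabetic generator
# identifier, one check digit, then an identification string. The match
# stops at the first character outside A-Z and 0-9.
TOKEN_PATTERN = re.compile(r"<[0-9]+[A-Z]+[0-9][A-Z0-9]+")


def char_value(ch):
    """Digits count as themselves; A..Z are assumed to be 10..35."""
    if "0" <= ch <= "9":
        return int(ch)
    return ord(ch) - ord("A") + 10


def is_valid(token):
    """Check a candidate token without locating its check digit."""
    body = token[1:]  # every character after the delimiter
    total = sum(char_value(c) for c in body) + len(body)
    return total % 10 == 0


def resolve_tokens(text, table):
    """Return (token, description) pairs for the valid tokens in `text`."""
    resolved = []
    for match in TOKEN_PATTERN.finditer(text):
        token = match.group(0)
        if is_valid(token):
            resolved.append((token, table.get(token)))
    return resolved


if __name__ == "__main__":
    table = {"<1AA40": "an illustrative item description"}
    text = "See <1AA40 for details; <1AA50 fails the check."
    print(resolve_tokens(text, table))
```

In this sketch a candidate that fails verification is skipped, and a 
valid token with no table entry is reported with a description of 
None. 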

The token identification system provides a scheme to 
supply universally unique token identifiers that are 
infinitely extensible thus providing enough capacity to 
identify a large number of items. The tokens are object 
independent, homogeneous, persistent, and without 
ambiguity. The token architecture supports parallel 
assignment from an arbitrarily large number of servers 
while maintaining uniqueness. 

The identification system overcomes many of the problems 
of the existing identification systems. The system 
provides flexibility for the format of the token since all 
that is required of the token identifiers is that the 
token begin with a token recognition character followed by 
a version identifier. Tokens in the required format can be 
generated and processed by the token identification 
system. Therefore, other identification schemes can be 
easily adapted to conform to this system. For example, 
all social security numbers can be altered to start with 
the token recognition character and the social security 
scheme can be given a version number that would be placed 
after the token recognition character. The variable field 
for the token would contain the nine digit social security 
number, which would then become part of this system. Any 
existing system can be given a separate version number 
unique for that identification system, the identifiers can 
be changed to start with a token recognition character, 
and the variable field would contain identifier strings in 
the format of the other existing systems. This 
identification system also provides flexibility by not 
being limited to only one specific type of object or 
instance. The same system can be used for almost anything 
identified by an existing token identification system. 

Further flexibility is provided by the extensibility of 
the tokens. There is no limit to the size of the tokens or 
the number of versions of tokens (except those imposed 
by limits in computer architecture and memory 
capabilities). The version field and variable field are both of 
varying lengths. The fields are delimited by having the 
characters at the end of the field be from a different 
character set type from characters at the start of the 
adjacent field. In that way, when a token is being 
parsed, the field is delimited by the change in character 
set types rather than having a fixed field length. 
Likewise, the token itself is delimited by a character not in 
the character set of the ending field string. 

While the invention has been particularly shown and 
described with reference to a preferred embodiment 
thereof, it will be understood by those skilled in the art 
that various other changes in the form and details may 
be made therein without departing from the spirit and 
scope of the invention. Accordingly, the method and 
system herein disclosed are to be considered merely as 
illustrative and the invention is to be limited only as 
specified in the claims. 

We claim: 

1. A computer system providing a computerized 
token identification system using tokens generated by 
said computer system for uniquely representing a plu- 
rality of items, said computer system comprising: 

a plurality of unique tokens; 

computer generation means for generating one of said 
plurality of unique tokens; 

computer assignment means for associating said one 
of said plurality of unique tokens with an item; 

computer recognition means for recognizing said one 
of said plurality of unique tokens; and 

computer identification means for identifying the 
item associated with said one of said plurality of 
unique tokens, 

wherein each of said plurality of unique tokens com- 
prises: 

a delimiter field containing a token recognition 
character; 

a version field following the delimiter field contain- 
ing a version string of at least one character, 
identifying a unique token version; and 

a variable field adjacent said version field, contain- 
ing a variable string of at least one character, 
conforming to a format specification for said 
unique token version, each variable string being 
unique for a unique token version. 

2. The system of claim 1 wherein said version field is 
adjacent said delimiter field. 

3. The system of claim 2 wherein said version string 
is of varying lengths. 

4. The system of claim 3 wherein the characters of 
said version string and said variable string that are adja- 
cent are from different character set types. 

5. The system of claim 4 wherein said variable string 
is of varying lengths. 

6. The system of claim 1 wherein said variable field 
comprises: 

a generator identifier field, adjacent said version field, 
containing a generator identifier string of at least 
one character, identifying a unique token genera- 
tor; 

a verification field, adjacent said generator identifier 
field, containing a verification character; and 

an identification field, adjacent said verification field, 
containing an identification string of at least one 
character, wherein said identification string is 
unique for a unique token generator. 

7. The system of claim 6 wherein the generator identi- 
fier string and the identification string are of variable 
lengths. 

8. The system of claim 7 wherein the characters of the 
generator identifier string and the identification string 
that are adjacent are from different character sets. 

9. A computer system providing a computerized 
token identification system using tokens generated by 
said computer system for uniquely representing a 
plurality of items, said computer system comprising: 

a plurality of unique tokens; 

computer generation means for generating one of said 
plurality of unique tokens; 

computer assignment means for associating said one 
of said plurality of unique tokens with an item; 

computer recognition means for recognizing said one 
of said plurality of unique tokens; and 

computer identification means for identifying the 
item associated with said one of said plurality of 
unique tokens, 

wherein each of said plurality of unique tokens com- 
prises: 

a delimiter field containing at least one token 
recognition character; 

a version field immediately following the delimiter 
field containing a version string of varying 
length having at least one character, identifying 
a unique token version; and 

a variable field immediately following said version 
field, containing a variable string of a varying 
length of at least one character, conforming to a 
format specification for said unique token 
version, each variable string being unique for a 
unique token version, wherein the characters of 
said version string and said variable string that 
are adjacent are from different character set 
types. 

10. The system of claim 9 wherein said variable field 
comprises: 

a generator identifier field, adjacent said version field, 
containing a generator identifier string of at least 
one character, identifying a unique token 
generator; 

a verification field, adjacent said generator identifier 
field, containing a verification character; and 

an identification field, adjacent said verification field, 
containing an identification string of at least one 
character, wherein said identification string is 
unique for a unique token generator, 

wherein the generator identifier string and the 
identification string are of varying lengths, and the 
characters of the generator identifier string and the 
identification string that are adjacent are from 
different character sets. 

11. In a data processing system having a processor 
and a memory, a computerized token identification 
system using tokens generated by said computer system 
for uniquely representing a plurality of items, each of 
said tokens having a token structure consisting of a 
plurality of fields, said token structure comprising: 

a delimiter field containing a token recognition char- 
acter, wherein said delimiter field is one of said 
plurality of fields; 

a version field following the delimiter field contain- 
ing a version string of at least one character, identi- 
fying a unique token version, wherein said version 
field is one of said plurality of fields; and 

a variable field adjacent said version field, containing 
a variable string of at least one character, conform- 
ing to a format specification for said unique token 
version, each variable string being unique for a 
unique token version, wherein said variable field is 
one of said plurality of fields. 

12. The system of claim 11 wherein said version field 
is adjacent said delimiter field. 

13. The system of claim 12 wherein said version 
string is of varying lengths. 

14. The system of claim 13 wherein the characters of 
said version string and said variable string that are adja- 
cent are from different character set types. 

15. The system of claim 14 wherein said variable 
string is of varying lengths. 

16. The system of claim 11 wherein said variable field 
comprises: 

a generator identifier field, adjacent said version field, 
containing a generator identifier string of at least 
one character, identifying a unique token genera- 
tor; 

a verification field, adjacent said generator identifier 
field, containing a verification character; and 

an identification field, adjacent said verification field, 
containing an identification string of at least one 
character, wherein said identification string is 
unique for a unique token generator. 

17. The system of claim 16 wherein said generator 
identifier string and said identification string are of 
varying lengths, and the characters of the generator 
identifier string and the identification string that are 
adjacent are from different character sets. 